Data normalization
Initialization
We start by loading the packages required for all the analyses performed in this section. We also define the root directory, within which all the input/output operations for this project are performed. Detailed software version information is provided at the end of this document to make the analysis easier to reproduce.
library(DT)
library(tidyverse)
library(data.table)
library(WriteXLS)
library(ggrepel)
library(ggpubr)
library(patchwork)
library(pheatmap)
library(RColorBrewer)
path = "/Users/ashwin/Documents/Projects/YeastScreen/Nonessential_MATa_screen/"
Normalization methodology
We use two different normalization strategies, each suited to a specific question:
- Plate control normalization: for a given plate \(i\), each cytoplasmic and mitochondrial roGFP2 ratio \(j\) is divided by the median of all controls on the same plate, i.e., for cytoplasm \[NormCyto_{ij} = \frac{Cyto_{ij}} {median(Ctrl_i)}\] and for mitochondria \[NormMito_{ij} = \frac{Mito_{ij}} {median(Ctrl_i)}\]
This control-based plate normalization is suitable for comparing redox levels across organelles and nutrients, since all roGFP2 ratios are expressed relative to the same plate control.
- Robust Z normalization: for a given plate \(i\), each cytoplasmic and mitochondrial roGFP2 ratio \(j\) is centered on the plate-specific median of its organelle and scaled by the corresponding median absolute deviation (MAD), i.e., for cytoplasm \[NormCyto_{ij} = \frac{Cyto_{ij} - median(Cyto_{i})} {mad(Cyto_{i})}\] and for mitochondria \[NormMito_{ij} = \frac{Mito_{ij} - median(Mito_{i})} {mad(Mito_{i})}\]
This organelle-specific normalization is suitable for comparing redox levels within each organelle across all plates, since it expresses how far each value lies from the plate median.
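The two formulas can be illustrated on a single toy plate. This is only a minimal sketch in base R: the data frame `plate` and its values are invented for illustration, and only the cytoplasmic wells are shown (the mitochondrial case is analogous).

```r
# Toy single-plate data; the values are invented for illustration only
plate <- data.frame(
  Content = c(rep("Control", 3), rep("Cytoplasm", 5)),
  roGFP2.ratio = c(0.9, 1.0, 1.1, 1.2, 1.5, 1.8, 2.4, 0.6),
  stringsAsFactors = FALSE
)

ctrl <- plate$roGFP2.ratio[plate$Content == "Control"]
cyto <- plate$roGFP2.ratio[plate$Content == "Cytoplasm"]

# Strategy 1: express each ratio as a fraction of the plate control median
frac_control <- cyto / median(ctrl, na.rm = TRUE)

# Strategy 2: robust Z score against the plate's own cytoplasmic values
# (base R mad() multiplies by the constant 1.4826 by default)
robust_z <- (cyto - median(cyto, na.rm = TRUE)) / mad(cyto, na.rm = TRUE)

print(frac_control)
print(robust_z)
```

After the robust Z transformation the plate has median 0 and MAD 1 by construction, which is why values from different plates become comparable within an organelle.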
Furthermore, for each normalization strategy, we summarized the quadruplicate values per gene by taking the median of the scaled values. The outlier removal in the previous section introduced many NA values. Each mutant has 4 values; we removed mutants with 2 or more NA values and, for mutants with exactly one NA value, substituted it with the median of the other 3 observed values. In addition, some mutants (genes) were present in multiple copies, on either the same plate or different plates. For these, we kept the set of quadruplicate values whose median has the largest absolute value.
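The filtering, imputation and duplicate handling described above can be sketched in base R as follows. The table `reps`, its column names (`r1` to `r4`, `med`) and its gene names are hypothetical and chosen only for illustration; unlike `top_n()`, this sketch keeps a single row when two copies tie.

```r
# Toy replicate table; gene names and values are invented for illustration
reps <- data.frame(
  Genes = c("geneA", "geneA", "geneB", "geneC"),
  r1 = c(0.5, -2.0, 1.0, NA),
  r2 = c(0.7, -1.8, NA, NA),
  r3 = c(0.6, -2.2, 1.2, 0.3),
  r4 = c(0.4, -2.4, 1.4, 0.2),
  stringsAsFactors = FALSE
)
rep_cols <- c("r1", "r2", "r3", "r4")

# Row-wise median of the observed replicates and number of missing values
reps$med <- apply(reps[, rep_cols], 1, median, na.rm = TRUE)
n_na <- rowSums(is.na(reps[, rep_cols]))

# Drop rows with 2 or more missing replicates (geneC here)
reps <- reps[n_na < 2, ]

# Replace a single missing replicate with the median of the other three
for (col in rep_cols) {
  miss <- is.na(reps[[col]])
  reps[[col]][miss] <- reps$med[miss]
}

# Among duplicated genes, keep the copy with the largest absolute median
reps <- reps[order(reps$Genes, -abs(reps$med)), ]
reps <- reps[!duplicated(reps$Genes), ]

print(reps)
```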
Eventually, we generate the following -
- A table with all the replicate values per organelle and nutrient condition, and the median summarized value, from all plates
- A matrix with genes/mutants in rows and all combinations of organelle, nutrient and replicate (and median summarized value) in columns
Below is the table of cleaned raw data that we will use in our analysis.
rawDatCleaned = readRDS(paste0(path, "data/workspaces/YeastMutantRedox_RawDataCleaned.RDS"))
rawDatCleaned_ctrlfilt = readRDS(paste0(path, "data/workspaces/YeastMutantRedox_RawData_Outliers_and_PoorControlFiltered.RDS"))
normalizeScreenData = function(normMethod)
{
if(normMethod == "fracControl"){
normDat = split(rawDatCleaned_ctrlfilt, as.character(rawDatCleaned_ctrlfilt$Type))
normDat = lapply(normDat, function(x) {
split(x, x$Plate)})
}
if(normMethod == "robustZ"){
normDat = split(rawDatCleaned, rawDatCleaned$Type)
normDat = lapply(normDat, function(x) {
split(x, x$Plate)})
}
normDatReps = normDat
for (i in 1:length(normDat))
{
for (j in 1:length(normDat[[i]]))
{
tmp = droplevels(normDat[[i]][[j]])
if(nrow(tmp) > 0) #since there are no entries from plate 150, was filtered in previous step
{
#---------------------------------------------------------------------------------------------
# Separate Controls, Cytoplasam and Mitochondrial roGFP2 ratios per plate
#---------------------------------------------------------------------------------------------
# Control
tmp.ctrl = droplevels(tmp[tmp$Content == "Control",])
# Cytoplasm
tmp.cyto = droplevels(tmp[tmp$Content == "Cytoplasm",])
# Mitochondria
tmp.mito = droplevels(tmp[tmp$Content == "Mitochondria",])
rm(tmp)
#---------------------------------------------------------------------------------------------
# Compute the corresponding normalization factors (median & median absolute deviation)
#---------------------------------------------------------------------------------------------
# Normalizing factor - Control
nf.ctrl.med = median(tmp.ctrl$roGFP2.ratio, na.rm = T)
# Normalizing factor - Cytoplasm
nf.cyto.med = median(tmp.cyto$roGFP2.ratio, na.rm = T)
nf.cyto.mad = mad(tmp.cyto$roGFP2.ratio, na.rm = T)
# Normalizing factor - Mitochondria
nf.mito.med = median(tmp.mito$roGFP2.ratio, na.rm = T)
nf.mito.mad = mad(tmp.mito$roGFP2.ratio, na.rm = T)
#---------------------------------------------------------------------------------------------
# The two normalization strategies (plate control and median based)
#---------------------------------------------------------------------------------------------
# Normalization 1 - Plate control based
if (normMethod == "fracControl")
{
tmp.cyto$roGFP2.ratio = tmp.cyto$roGFP2.ratio / nf.ctrl.med
tmp.mito$roGFP2.ratio = tmp.mito$roGFP2.ratio / nf.ctrl.med
}
# Normalization 2 - Plate median based
if (normMethod == "robustZ")
{
tmp.cyto$roGFP2.ratio = (tmp.cyto$roGFP2.ratio - nf.cyto.med) / nf.cyto.mad
tmp.mito$roGFP2.ratio = (tmp.mito$roGFP2.ratio - nf.mito.med) / nf.mito.mad
}
rm(nf.ctrl.med,
nf.cyto.med,
nf.cyto.mad,
nf.mito.med,
nf.mito.mad)
#---------------------------------------------------------------------------------------------
# Normalized data with replicates - coverting from long to wide table format
#---------------------------------------------------------------------------------------------
#---Cytoplasm---#
tmp.cyto.repl = tmp.cyto %>%
select(Gene.Symbol, Plate, Group, roGFP2.ratio, Type, Content) %>%
group_by(Gene.Symbol, Group) %>%
mutate(pseurep = paste0("roGFP2_ratio_", 1:n())) %>%
spread(key = pseurep, value = roGFP2.ratio) %>%
ungroup() %>%
select(-Group) %>%
rename(Genes = Gene.Symbol,
Nutrient = Type,
Organelle = Content)
#---Mitochondria---#
tmp.mito.repl = tmp.mito %>%
select(Gene.Symbol, Plate, Group, roGFP2.ratio, Type, Content) %>%
group_by(Gene.Symbol, Group) %>%
mutate(pseurep = paste0("roGFP2_ratio_", 1:n())) %>%
spread(key = pseurep, value = roGFP2.ratio) %>%
ungroup() %>%
select(-Group) %>%
rename(Genes = Gene.Symbol,
Nutrient = Type,
Organelle = Content)
#--Compilation---#
normDatReps[[i]][[j]] = rbind(tmp.cyto.repl, tmp.mito.repl)
#Deleting
rm(tmp.ctrl,
tmp.cyto,
tmp.mito,
tmp.cyto.repl,
tmp.mito.repl)
}
}
rm(j)
}
rm(i)
res = lapply(normDatReps, function(x) {
x = do.call("rbind", x)
x = as.data.frame(x)
x$Plate = factor(x$Plate)
x$Organelle = factor(x$Organelle, levels = c("Mitochondria", "Cytoplasm"))
x$Nutrient = factor(x$Nutrient, levels = c("Glucose", "Galactose", "Glycerol"))
rownames(x) = 1:nrow(x)
return(x)
})
res = do.call("rbind", res)
#-------------------------------------------------------------------------------------------------
# Summarizing mutiple mutants (i.e same gene) from the same or different plates
# Also dropping genes with 2 or more NA values
# For genes with just 1 NA value, replacing it with the median of the remaining 3 observed values
#-------------------------------------------------------------------------------------------------
res = res %>%
rowwise() %>%
mutate(Median_roGFP2_ratio = median(c(roGFP2_ratio_1, roGFP2_ratio_2, roGFP2_ratio_3, roGFP2_ratio_4), na.rm=T),
NA_per_row = sum(is.na(c(roGFP2_ratio_1, roGFP2_ratio_2, roGFP2_ratio_3, roGFP2_ratio_4)))) %>%
filter(NA_per_row < 2) %>%
ungroup() %>%
mutate_at(vars(starts_with("roGFP2_ratio_")), # impute the single missing replicate with the row median
function(x) ifelse(is.na(x), .$Median_roGFP2_ratio, x)) %>%
select(-NA_per_row) %>%
group_by(Organelle, Nutrient, Genes) %>%
top_n(n=1, abs(Median_roGFP2_ratio)) %>%
ungroup()
#-------------------------------------------------------------------------------------------------
# Compiling all replicates across organelles and nutrient conditions into a single matrix
#-------------------------------------------------------------------------------------------------
redoxMat = data.table(res[,c("Genes", "Nutrient", "Organelle", "roGFP2_ratio_1", "roGFP2_ratio_2", "roGFP2_ratio_3", "roGFP2_ratio_4")])
redoxMat = dcast(redoxMat, Genes ~ Organelle + Nutrient, fun.aggregate = function(x){x}, fill=NA,
value.var = c("roGFP2_ratio_1", "roGFP2_ratio_2", "roGFP2_ratio_3", "roGFP2_ratio_4"))
redoxMat = redoxMat[,c(1,
2,8,14,20,
3,9,15,21,
4,10,16,22,
5,11,17,23,
6,12,18,24,
7,13,19,25
)]
redoxMat = data.frame(redoxMat, stringsAsFactors = F )
rownames(redoxMat) = redoxMat$Genes
redoxMat = redoxMat[,-1]
#-------------------------------------------------------------------------------------------------
# Compiling the median values across organelles and nutrient conditions into a single matrix
#-------------------------------------------------------------------------------------------------
redoxMat_median = data.table(res[,c("Genes", "Nutrient", "Organelle", "Median_roGFP2_ratio")])
redoxMat_median = dcast(redoxMat_median, Genes ~ Organelle + Nutrient, fun.aggregate = function(x){x}, fill=NA,
value.var = "Median_roGFP2_ratio")
redoxMat_median = data.frame(redoxMat_median, stringsAsFactors = F )
rownames(redoxMat_median) = redoxMat_median$Genes
redoxMat_median = redoxMat_median[,-1]
#-------------------------------------------------------------------------------------------------
# Putting all the results into one
#-------------------------------------------------------------------------------------------------
res = list(redox_table = res, redox_replicates = redoxMat, redox_median = redoxMat_median)
rm(normDat, normDatReps, redoxMat, redoxMat_median)
return(res)
}
normDat.ctrNorm = normalizeScreenData(normMethod = "fracControl")
normDat.Znorm = normalizeScreenData(normMethod = "robustZ")
Normalized data overview
We normalize the data as described above. Below are the summary and the distribution of the normalized data -
- For per plate normalization based on plate specific control
Genes Plate Nutrient Organelle
Length:26376 102 : 564 Glucose :8798 Mitochondria:13159
Class :character 105 : 562 Galactose:9014 Cytoplasm :13217
Mode :character 107 : 555 Glycerol :8564
116 : 554
118 : 553
142 : 545
(Other):23043
roGFP2_ratio_1 roGFP2_ratio_2 roGFP2_ratio_3 roGFP2_ratio_4
Min. :0.000219 Min. :0.000227 Min. :0.000225 Min. :0.00022
1st Qu.:1.257943 1st Qu.:1.252091 1st Qu.:1.243790 1st Qu.:1.24571
Median :1.574910 Median :1.569008 Median :1.561554 Median :1.56213
Mean :1.630336 Mean :1.626620 Mean :1.623120 Mean :1.62213
3rd Qu.:1.939161 3rd Qu.:1.939125 3rd Qu.:1.935817 3rd Qu.:1.92914
Max. :6.984722 Max. :7.126663 Max. :7.204534 Max. :7.25000
NA's :177
Median_roGFP2_ratio
Min. :0.000227
1st Qu.:1.259919
Median :1.563700
Mean :1.625279
3rd Qu.:1.930953
Max. :6.225727
- For per plate normalization based on median organelle specific values
Genes Plate Nutrient Organelle
Length:27695 102 : 564 Glucose :9192 Mitochondria:13807
Class :character 105 : 564 Galactose:9246 Cytoplasm :13888
Mode :character 110 : 563 Glycerol :9257
116 : 561
113 : 559
114 : 559
(Other):24325
roGFP2_ratio_1 roGFP2_ratio_2 roGFP2_ratio_3
Min. :-14.18291 Min. :-14.180302 Min. :-17.29949
1st Qu.: -0.64326 1st Qu.: -0.674491 1st Qu.: -0.69362
Median : 0.03188 Median : 0.002007 Median : -0.02326
Mean : 0.05993 Mean : 0.035221 Mean : 0.01129
3rd Qu.: 0.69614 3rd Qu.: 0.681601 3rd Qu.: 0.66520
Max. : 27.78831 Max. : 27.774188 Max. : 28.68528
roGFP2_ratio_4 Median_roGFP2_ratio
Min. :-9.92714 Min. :-9.61836
1st Qu.:-0.67574 1st Qu.:-0.54478
Median :-0.01056 Median :-0.01878
Mean : 0.01211 Mean : 0.03033
3rd Qu.: 0.65572 3rd Qu.: 0.55489
Max. :27.81965 Max. :27.78125
NA's :255
- Number of mutants per condition
a = normDat.ctrNorm$redox_table
a = split(a$Genes, paste0(a$Nutrient, a$Organelle))
sapply(a, function(x) length(unique(x)))
GalactoseCytoplasm GalactoseMitochondria GlucoseCytoplasm
4519 4495 4416
GlucoseMitochondria GlycerolCytoplasm GlycerolMitochondria
4382 4282 4282
- Similarity between the normalization strategies
plotList = vector("list", 6)
names(plotList) = c("Glucose-Cytoplasm", "Glucose-Mitochondria",
"Galactose-Cytoplasm", "Galactose-Mitochondria",
"Glycerol-Cytoplasm", "Glycerol-Mitochondria")
plotListTop = plotList
for(i in c("Glucose", "Galactose", "Glycerol"))
{
for(j in c("Cytoplasm", "Mitochondria"))
{
a = normDat.Znorm$redox_table[which(normDat.Znorm$redox_table$Nutrient == i & normDat.Znorm$redox_table$Organelle == j),]
a = a[, c("Genes", "Median_roGFP2_ratio")]
colnames(a) = c("Genes", "Znorm")
b = normDat.ctrNorm$redox_table[which(normDat.ctrNorm$redox_table$Nutrient == i & normDat.ctrNorm$redox_table$Organelle == j),]
b = b[, c("Genes", "Median_roGFP2_ratio")]
colnames(b) = c("Genes", "Ctrlnorm")
df = merge(a,b)
topZ = quantile(df$Znorm, probs = c(0.05, 0.95))
topC = quantile(df$Ctrlnorm, probs = c(0.05, 0.95))
df_top = df[which(df$Znorm < topZ[1] | df$Znorm > topZ[2] | df$Ctrlnorm < topC[1] | df$Ctrlnorm > topC[2]),]
id = paste(i,j,sep="-")
plotList[[id]] = ggscatter(df, x = "Znorm", y = "Ctrlnorm",
color = "black", shape = 20, size = 0.5, # Points color, shape and size
add = "reg.line", # Add regression line
add.params = list(color = "blue", fill = "lightgray"), # Customize reg. line
conf.int = TRUE, # Add confidence interval
cor.coef = TRUE, # Add correlation coefficient. see ?stat_cor
cor.coeff.args = list(method = "spearman", label.x = -10, label.y = 3.5, label.sep = "\n"),
ggtheme = theme_classic(base_size = 8)) + labs(subtitle = id)
plotListTop[[id]] = ggscatter(df_top, x = "Znorm", y = "Ctrlnorm",
color = "black", shape = 20, size = 0.5, # Points color, shape and size
add = "reg.line", # Add regression line
add.params = list(color = "blue", fill = "lightgray"), # Customize reg. line
conf.int = TRUE, # Add confidence interval
cor.coef = TRUE, # Add correlation coefficient. see ?stat_cor
cor.coeff.args = list(method = "spearman", label.x = -10, label.y = 3.5, label.sep = "\n"),
ggtheme = theme_classic(base_size = 8)) + labs(subtitle = id)
rm(a, b, df,df_top, topZ, topC, id)
}
rm(j)
}
rm(i)
Plotting the correlation between the two normalization techniques for all matching mutants
plotList$`Glucose-Cytoplasm` + plotList$`Glucose-Mitochondria` + plotList$`Galactose-Cytoplasm` +
plotList$`Galactose-Mitochondria` + plotList$`Glycerol-Cytoplasm` + plotList$`Glycerol-Mitochondria` +
plot_layout(nrow = 3, ncol = 2)
Plotting the correlation between the two normalization techniques using ONLY the top hits, i.e., mutants in the highest or lowest 5% of roGFP2 ratios from either normalization method
plotListTop$`Glucose-Cytoplasm` + plotListTop$`Glucose-Mitochondria` + plotListTop$`Galactose-Cytoplasm` +
plotListTop$`Galactose-Mitochondria` + plotListTop$`Glycerol-Cytoplasm` + plotListTop$`Glycerol-Mitochondria` +
plot_layout(nrow = 3, ncol = 2)
Sample similarity - Dimensionality reduction
Next, we apply multidimensional scaling (MDS), a dimensionality reduction method, to the roGFP2 ratios from both the plate control normalization and the robust Z normalization, to identify the major groupings of the yeast mutants based on their redox status across organelles and nutrient conditions. The redox status clearly differs between the organelles (also seen in the density plots: higher in mitochondria than in cytoplasm), and within each organelle the mutants group by nutrient condition.
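As a minimal sketch of classical MDS on a samples-in-columns matrix, the toy example below uses base R `dist()` and `cmdscale()`. The matrix `dat`, its dimensions and the sample names are invented for illustration and are not the screen data.

```r
set.seed(1)

# Toy matrix: 50 hypothetical mutants (rows) by 6 hypothetical samples (columns)
dat <- matrix(rnorm(50 * 6), nrow = 50,
              dimnames = list(NULL, paste0("sample", 1:6)))

# Shift the last three samples so that two groups of samples exist
dat[, 4:6] <- dat[, 4:6] + 3

# Samples are in columns, so transpose before computing Euclidean distances
d <- dist(t(dat))

# Classical MDS to two dimensions: one row of coordinates per sample
mds <- cmdscale(d, k = 2)
colnames(mds) <- c("MDS1", "MDS2")

print(mds)
```

Because the two groups of samples differ systematically, they separate along the first MDS axis; the sign of that axis is arbitrary.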
getMDSdata = function(dat) {
mds = cmdscale(dist(t(dat)), eig = TRUE, k = 2)$points
colnames(mds) = c("MDS1", "MDS2")
col_anno = do.call("rbind", strsplit(rownames(mds), "_"))
col_anno = col_anno[, -c(1:3)]
colnames(col_anno) = c("Compartment", "Nutrient")
mds = data.frame(mds, col_anno)
}
mdsZ = getMDSdata(dat = normDat.Znorm$redox_replicates)
mdsC = getMDSdata(dat = normDat.ctrNorm$redox_replicates)
p1 = ggplot(mdsZ, aes(x = MDS1, y = MDS2)) + theme_classic(base_size = 8) + labs(subtitle = "Robust Z normalized") +
geom_point(aes(color = Nutrient, shape = Compartment), size = 2) + scale_color_manual(values = c("#fc8d62",
"#66c2a5", "#8da0cb"))
p2 = ggplot(mdsC, aes(x = MDS1, y = MDS2)) + theme_classic(base_size = 8) + labs(subtitle = "Plate control normalized") +
geom_point(aes(color = Nutrient, shape = Compartment), size = 2) + scale_color_manual(values = c("#fc8d62",
"#66c2a5", "#8da0cb"))
pA = p1 + p2 + plot_layout(guides = "collect")
p3 <- ggviolin(normDat.Znorm$redox_table, x = "Organelle", y = "Median_roGFP2_ratio",
xlab = "", ylab = "Median roGFP2 ratio", draw_quantiles = 0.5, fill = "Organelle",
facet.by = "Nutrient", color = "grey90") + stat_compare_means(label = "p.format",
method = "wilcox", cex = 2) + labs(subtitle = "Robust Z normalized") + theme_classic(base_size = 8) +
theme(legend.position = "right", axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
scale_fill_manual(values = c("#b2df8a", "#7570b3"))
p4 <- ggviolin(normDat.ctrNorm$redox_table, x = "Organelle", y = "Median_roGFP2_ratio",
xlab = "", ylab = "Median roGFP2 ratio", draw_quantiles = 0.5, fill = "Organelle",
facet.by = "Nutrient", color = "grey90") + stat_compare_means(label = "p.format",
method = "wilcox", cex = 2) + labs(subtitle = "Plate control normalized") + theme_classic(base_size = 8) +
theme(legend.position = "right", axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
scale_fill_manual(values = c("#b2df8a", "#7570b3"))
pB = p3 + p4 + plot_layout(guides = "collect")
p5 = ggviolin(normDat.Znorm$redox_table, x = "Nutrient", y = "Median_roGFP2_ratio",
xlab = "", ylab = "Median roGFP2 ratio", draw_quantiles = 0.5, fill = "Nutrient",
facet.by = "Organelle", color = "grey90") + stat_compare_means(method = "anova",
label.y = 22, label.x = 1.5, cex = 2) + stat_compare_means(label = "p.signif",
method = "wilcox", ref.group = ".all.", hide.ns = TRUE) + scale_fill_manual(values = c("#66c2a5",
"#fc8d62", "#8da0cb")) + labs(subtitle = "Robust Z normalized") + theme_classic(base_size = 8) +
theme(legend.position = "right", axis.text.x = element_blank(), axis.ticks.x = element_blank())
p6 = ggviolin(normDat.ctrNorm$redox_table, x = "Nutrient", y = "Median_roGFP2_ratio",
xlab = "", ylab = "Median roGFP2 ratio", draw_quantiles = 0.5, fill = "Nutrient",
facet.by = "Organelle", color = "grey90") + stat_compare_means(method = "anova",
label.y = 5, label.x = 1.5, cex = 2) + stat_compare_means(label = "p.signif",
method = "wilcox", ref.group = ".all.", hide.ns = TRUE) + scale_fill_manual(values = c("#66c2a5",
"#fc8d62", "#8da0cb")) + labs(subtitle = "Plate control normalized") + theme_classic(base_size = 8) +
theme(legend.position = "right", axis.text.x = element_blank(), axis.ticks.x = element_blank())
pC = p5 + p6 + plot_layout(guides = "collect")
p = pA/pB/pC
ggsave(filename = paste0(path, "analysis/normalization/roGFP2_ratio_comparison_nutrient_compartments.pdf"),
plot = p, width = 7, height = 7)
p
Replicate similarity - Correlation
Below we show the correlation among replicates.
- first with plate median (nutrient specific) based Robust Z normalization
c1 = cor(normDat.Znorm$redox_replicates, method = "spearman", use = "pairwise.complete.obs")
c2 = cor(normDat.ctrNorm$redox_replicates, method = "spearman", use = "pairwise.complete.obs")
if (identical(colnames(c1), colnames(c2))) {
col_anno = do.call("rbind", strsplit(colnames(c1), "_"))
col_anno = col_anno[, -c(1:3)]
colnames(col_anno) = c("Compartment", "Nutrient")
rownames(col_anno) = colnames(c1)
col_anno = data.frame(col_anno, stringsAsFactors = F)
}
colr = list(Compartment = c(Mitochondria = "#b2df8a", Cytoplasm = "#7570b3"), Nutrient = c(Glucose = "#66c2a5",
Galactose = "#fc8d62", Glycerol = "#8da0cb"))
c1[c1 == 1] = NA
c2[c2 == 1] = NA
pheatmap(c1, clustering_method = "ward.D2", color = brewer.pal(n = 9, "Greys"), show_rownames = F,
show_colnames = F, border_color = "white", na_col = "white", annotation_row = col_anno,
annotation_col = col_anno, annotation_colors = colr, fontsize = 8, main = "Robust Z normalize")
pheatmap(c1, clustering_method = "ward.D2", color = brewer.pal(n = 9, "Greys"), show_rownames = F,
show_colnames = F, border_color = "white", na_col = "white", annotation_row = col_anno,
annotation_col = col_anno, annotation_colors = colr, fontsize = 8, filename = paste0(path,
"analysis/normalization/replicate_correlation_normZnorm.pdf"), width = 5,
height = 5)
- second with control based normalization
pheatmap(c2, clustering_method = "ward.D2", color = brewer.pal(n = 9, "Greys"), show_rownames = F,
show_colnames = F, border_color = "white", na_col = "white", annotation_row = col_anno,
annotation_col = col_anno, annotation_colors = colr, fontsize = 8, main = "Control normalize")
pheatmap(c2, clustering_method = "ward.D2", color = brewer.pal(n = 9, "Greys"), show_rownames = F,
show_colnames = F, border_color = "white", na_col = "white", annotation_row = col_anno,
annotation_col = col_anno, annotation_colors = colr, fontsize = 8, filename = paste0(path,
"analysis/normalization/replicate_correlation_normCtr.pdf"), width = 5, height = 5)
Distribution of normalized data
- For per plate normalization based on plate specific control and an interactive data table to access the data
p = ggplot(normDat.ctrNorm$redox_table) + theme_bw(base_size = 8) +
geom_boxplot(aes(x = Plate, y = Median_roGFP2_ratio), outlier.size = 0.1, lwd=0.2) + #ylim(-10, 10) +
facet_grid(Nutrient ~ Organelle) +
theme(axis.text.x = element_text(angle = 90, hjust = 0.5, vjust = 0.5)) +
geom_text_repel(data = subset(normDat.ctrNorm$redox_table, abs(Median_roGFP2_ratio) > 3), aes(x = Plate, y = Median_roGFP2_ratio, label = Genes), size = 2)
ggsave(filename = paste0(path,"analysis/normalization/roGFP2_ratio_distribution_PlateControl_normalized.pdf"),
plot = p, width = 10, height = 15)
p
tmp = normDat.ctrNorm$redox_table
tmp[,5:9] = round(tmp[,5:9],3)
datatable(tmp, rownames = FALSE, filter="top", class="compact",
extensions = c('Buttons') ,
options = list(autoWidth = TRUE,
dom = 'Bfrtip',
buttons = c('csv', 'excel')
))
- For per plate normalization based on median organelle specific values and an interactive data table to access the data
p = ggplot(normDat.Znorm$redox_table) + theme_bw(base_size = 9) + #geom_hline(yintercept = c(-5,5)) +
geom_boxplot(aes(x = Plate, y = Median_roGFP2_ratio), outlier.size = 0.1, lwd = 0.2) + #ylim(-10, 10) +
facet_grid(Nutrient ~ Organelle) +
theme(axis.text.x = element_text(angle = 90, hjust = 0.5, vjust = 0.5)) +
geom_text_repel(data = subset(normDat.Znorm$redox_table, Median_roGFP2_ratio > 5 | Median_roGFP2_ratio < -5), aes(x = Plate, y = Median_roGFP2_ratio, label = Genes), size = 2)
ggsave(filename = paste0(path,"analysis/normalization/roGFP2_ratio_distribution_RobustZ_normalized.pdf"),
plot = p, width = 10, height = 10)
p
tmp = normDat.Znorm$redox_table
tmp[,5:9] = round(tmp[,5:9],3)
datatable(tmp, rownames = FALSE, filter="top", class="compact",
extensions = c('Buttons') ,
options = list(autoWidth = TRUE,
dom = 'Bfrtip',
buttons = c('csv', 'excel')
))
Saving the data
Finally, we save the normalized data as .RDS (R data object) and .xlsx (Excel) files. All downstream analyses start from these normalized data.
saveRDS(normDat.ctrNorm, paste0(path, "data/workspaces/YeastMutantRedox_NormalizedData_PlateControl.RDS"))
saveRDS(normDat.Znorm, paste0(path, "data/workspaces/YeastMutantRedox_NormalizedData_RobustZ.RDS"))
WriteXLS(normDat.ctrNorm$redox_table, ExcelFileName = paste0(path, "analysis/supplementary/tables/YeastMutantRedox_NormalizedData_PlateControl.xlsx"),
AdjWidth = TRUE, BoldHeaderRow = TRUE, FreezeRow = 1)
WriteXLS(normDat.Znorm$redox_table, ExcelFileName = paste0(path, "analysis/supplementary/tables/YeastMutantRedox_NormalizedData_RobustZ.xlsx"),
AdjWidth = TRUE, BoldHeaderRow = TRUE, FreezeRow = 1)
Session information
R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.16
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RColorBrewer_1.1-2 pheatmap_1.0.12 patchwork_1.0.0 ggpubr_0.2.4
[5] magrittr_1.5 ggrepel_0.8.1 WriteXLS_5.0.0 data.table_1.12.8
[9] forcats_0.4.0 stringr_1.4.0 dplyr_0.8.4 purrr_0.3.3
[13] readr_1.3.1 tidyr_1.0.2 tibble_2.1.3 ggplot2_3.2.1
[17] tidyverse_1.3.0 DT_0.12 rmdformats_0.3.6 knitr_1.28
loaded via a namespace (and not attached):
[1] httr_1.4.1 jsonlite_1.6.1 modelr_0.1.5 shiny_1.4.0
[5] assertthat_0.2.1 cellranger_1.1.0 yaml_2.2.1 pillar_1.4.3
[9] backports_1.1.5 lattice_0.20-38 glue_1.3.1 digest_0.6.23
[13] promises_1.1.0 ggsignif_0.6.0 rvest_0.3.5 colorspace_1.4-1
[17] htmltools_0.4.0 httpuv_1.5.5 plyr_1.8.5 pkgconfig_2.0.3
[21] broom_0.5.4 haven_2.2.0 bookdown_0.17 xtable_1.8-4
[25] scales_1.1.0 later_1.0.0 generics_0.0.2 farver_2.0.3
[29] ellipsis_0.3.0 withr_2.1.2 lazyeval_0.2.2 cli_2.0.1
[33] mime_0.9 crayon_1.3.4 readxl_1.3.1 evaluate_0.14
[37] fs_1.3.1 fansi_0.4.1 nlme_3.1-144 xml2_1.2.2
[41] tools_3.6.2 hms_0.5.3 formatR_1.7 lifecycle_0.1.0
[45] munsell_0.5.0 reprex_0.3.0 compiler_3.6.2 rlang_0.4.4
[49] grid_3.6.2 rstudioapi_0.11 htmlwidgets_1.5.1 crosstalk_1.0.0
[53] labeling_0.3 rmarkdown_2.1 gtable_0.3.0 DBI_1.1.0
[57] reshape2_1.4.3 R6_2.4.1 lubridate_1.7.4 fastmap_1.0.1
[61] stringi_1.4.5 Rcpp_1.0.3 vctrs_0.2.2 dbplyr_1.4.2
[65] tidyselect_1.0.0 xfun_0.12